Towards an encoding standard for social media and CMC: Experiences from German and French corpus projects using TEI

Authors

Adrien Barbaresi

Michael Beißwenger

Eric Ehrhardt

Alexander Geyken

Marc Kupietz

Lothar Lemnitzer

Harald Lüngen

Angelika Storrer

Abstract

Format of this submission: Our proposal for a mini panel comprises two papers (Beißwenger et al. and Chanier et al.). If accepted, we would open the panel with a short introduction (10-15 minutes) to the basics of text encoding with the TEI framework and to some general challenges in modeling CMC with TEI. We would then present and discuss the two papers (40 minutes of presentation plus 20 minutes of discussion). In total, the mini panel would last 75 minutes.

Similar resources

The CoMeRe corpus for French: structuring and annotating heterogeneous CMC genres

The CoMeRe project aims to build a kernel corpus of different Computer-Mediated Communication (CMC) genres with interactions in French as the main language, by assembling interactions stemming from networks such as the Internet or telecommunication, as well as mono and multimodal, synchronous and asynchronous communications. Corpora are assembled using a standard, thanks to the TEI (Text Encodi...

Social Media Writing and Social Class: A Correlational Analysis of Adolescent CMC and Social Background

In a large social media corpus (2.9 million tokens), we analyze Flemish adolescents’ non-standard writing practices and look for correlations with the teenagers’ social class. Three different aspects of adolescents’ social background are included: educational track, parental profession, and home language. Since the data reveal that these parameters are highly correlated, we combine them into on...

Building and Annotating Corpora of Computer-Mediated Communication: Issues and Challenges at the Interface of Corpus and Computational Linguistics

The CoMeRe project aims to build a kernel corpus of different computer-mediated communication (CMC) genres with interactions in French as the main language, by assembling interactions stemming from networks such as the Internet or telecommunications, as well as mono and multimodal, and synchronous and asynchronous communications. Corpora are assembled using a standard, thanks to the Text Encodi...

DeRiK: A German reference corpus of computer-mediated communication

The paper describes an ongoing project that aims at building a reference corpus of German computer-mediated communication (CMC) as a new component of an already existing reference corpus of written contemporary German. The ‘Deutsches Referenzkorpus zur internetbasierten Kommunikation’ (DeRiK) shall include data from the most prominent CMC genres amongst German Internet users and, thus, close a ...

EmpiriST 2015: A Shared Task on the Automatic Linguistic Annotation of Computer-Mediated Communication and Web Corpora

This paper describes the goals, design and results of a shared task on the automatic linguistic annotation of German language data from genres of computer-mediated communication (CMC), social media interactions and Web corpora. The two subtasks of tokenization and part-of-speech tagging were performed on two data sets: (i) a genuine CMC data set with samples from several CMC genres, and (ii) a ...

Publication date: 2015
